Skip to content

feat: enabling dnstt#497

Draft
WendelHime wants to merge 16 commits into
mainfrom
feat/enabling-dnstt
Draft

feat: enabling dnstt#497
WendelHime wants to merge 16 commits into
mainfrom
feat/enabling-dnstt

Conversation

@WendelHime
Copy link
Copy Markdown
Contributor

@WendelHime WendelHime commented May 28, 2026

This pull request introduces significant improvements to the DNSTT (DNS tunnel) transport in the kindling subsystem, enhances reliability and performance under censorship, and updates several dependencies. The DNSTT transport is now enabled by default, features more robust tunnel management and probing, and better integrates with the system's bypass logic. Additionally, HTTP timeouts have been increased, and dependencies have been updated for improved stability and compatibility.

DNSTT Transport Improvements:

  • The DNSTT transport is now enabled by default in EnabledTransports, making it available for use in production scenarios.
  • The DNSTT transport now creates and probes tunnel instances on demand, rather than at startup, reducing resource usage and improving resilience. Tunnel management has been refactored to track consecutive failures, discard failing tunnels, and periodically probe for working tunnels. [1] [2]
  • The DNSTT transport now uses the bypass.DialContext dialer, ensuring that DoH/DoT connections do not loop back through the VPN tunnel.
  • Added RequestTimeout and MaxLength methods to DNSTT transport, tuning request timeouts and maximum body size for the high-latency, low-MTU nature of DNS tunnels.
  • Improved tunnel cleanup and cancellation logic, ensuring that probe workers are promptly stopped and successful tunnels are properly closed.

Kindling Client and Transport Integration:

  • The logic for enabling and configuring transports in NewKindling has been updated, moving the Smart transport configuration before DNSTT and ensuring correct option ordering. [1] [2]

General Configuration and Constants:

  • Increased the default HTTP timeout from 60 seconds to 360 seconds to accommodate slow connections, especially over DNS tunnels.

Dependency Updates:

  • Updated several dependencies in go.mod, including github.com/getlantern/dnstt, go.opentelemetry.io/otel, golang.org/x/crypto, and others, to newer versions for improved performance and compatibility. [1] [2] [3] [4]
  • Updated test assertions in parser_test.go to reflect the refactored DNSTT transport internals (configs instead of options). [1] [2] [3] [4]

Code Structure and Typing:

  • Added a type alias for DNSTT in the parser to simplify imports and usage.
  • Refactored tunnel configuration and resolver logic for clarity and maintainability.

These changes collectively make the DNSTT transport more robust, production-ready, and easier to maintain, while also keeping the codebase up-to-date with dependency improvements.

WendelHime added 16 commits May 28, 2026 12:30
  - Goroutine storm fixed: DNSTT instances now created lazily inside pool workers (max 10 at a time = 320 goroutines, down from 5,216)
  - RT caching: probe's http.Transport reused for requests, avoiding a second CONNECT tunnel that causes 502 on the same smux session
  - probeCancelFn: Close() cancels in-progress probe workers promptly so their DoH goroutines don't starve subsequent probes
  - waitFor raised to 2 minutes to allow full session establishment (~20-60s)
  - DNSTT type alias exported so callers don't need a direct import of github.com/getlantern/dnstt
also running go mod tidy
Replace instant-death markFailed with a softer consecutiveFailures
counter (5 strikes). When a tunnel fails, clear the cached probe
round tripper so the next attempt gets a fresh smux stream instead
of reusing a degraded transport. Close permanently dead tunnels
instead of leaking them via 'continue'.

isSucceeding now requires both that the tunnel has ever succeeded
AND is under the failure threshold — the old || logic would
recycle tunnels that had never worked.

Add an on-demand probe trigger: when NewRoundTripper exhausts all
usable tunnels it signals probeCh, which findWorkingDNSTunnels
picks up to start a fresh cycle immediately instead of waiting
for the timer. Reduce the probe interval from 30 min to 5 min so
re-probes happen even without a trigger signal.

Also wire domain/resolver into the dnsTunnel struct so RoundTrip
logs identify which tunnel handled each request.
DNSTT sessions run over DNS at 135-byte MTU; a single TLS handshake
can take 30-60 seconds through the tunnel. The race transport's
default 80s GET budget and 3min POST budget are often too tight.

Add RequestTimeout() returning 5 minutes so kindling's race transport
gives this transport enough time to complete. The upstream kindling
change (../kindling) wires this into the race budget.

Also replace the kindling dependency with a local path so these
changes take effect together.
The config fetcher and kindling HTTP client both use this timeout
as an outer bound around the race transport. With DNSTT requesting
a 5-minute race budget, the outer timeout of 180s was killing
requests before the tunnel could complete — response headers were
arriving but body reads were cut off mid-stream.

360s gives DNSTT 5min with 1min of headroom.
Removed the probeRT field from dnsTunnel and all related methods
(setProbeRT, getRoundTripper, clearProbeRT). The probe still validates
the tunnel but no longer caches its smux session — each real request
now creates a fresh round tripper via NewRoundTripper.

This fixes the 'io: read/write on closed pipe' cascade: when multiple
concurrent requests shared the probe's cached smux session, a single
KCP disconnect killed all streams simultaneously. With per-request
sessions, each request has isolation — one dying session doesn't
take down others.
Sets a 10KB body size cap on the DNSTT transport so the race transport
skips it for oversized payloads (e.g. 1.4MB issue report protobufs).
DNS tunnels have a 135-byte MTU — 10KB already needs ~75 DNS queries
and nearly exhausts the 5-minute RequestTimeout budget. Larger payloads
should route through another transport.
Moves DNSTT registration after Smart/AMP/Domainfront so it only races
if the other transports fail to connect. Also re-enables all transports
now that DNSTT is no longer the sole transport.
Previously, NewRoundTripper returned a connectedRoundtripper without
establishing any connection — the actual smux session was created lazily
in RoundTrip. This meant DNSTT's NewRoundTripper returned instantly and
always won the connection race against Smart/AMP/Domainfront that actually
pre-connect.

Now NewRoundTripper calls tun.NewRoundTripper during the connect phase so
DNSTT pays the real connection cost (KCP + smux handshake) during the race.
connectedRoundtripper stores the pre-built RoundTripper and uses it directly.
When a tunnel fails to create a round tripper, the DNSTT instance's
goroutines (KCP poll loop, DNS senders) would keep running since the
tunnel was consumed from the channel but never closed. This leaked
resources and shrunk the available tunnel pool until the next probe
cycle. Now closed promptly.

Also add blank line separator between immutable fields (domain, resolver)
and mutex-guarded fields (lastSucceeded, consecutiveFailures) for clarity.
Comment thread common/constants.go
Name = "lantern"

DefaultHTTPTimeout = (60 * time.Second)
DefaultHTTPTimeout = (360 * time.Second)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is used by more than DNSTT, so we should probably define a separate variable for DNSTT timeout.

Comment thread kindling/dnstt/parser.go
pool := pond.New(poolSize, 10, pond.Context(pondCtx))
for _, dnst := range m.options {
for _, cfg := range m.configs {
cfg := cfg
Copy link
Copy Markdown
Collaborator

@garmr-ulfr garmr-ulfr Jun 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cfg := cfg not necessary since go v1.22.

Comment thread kindling/dnstt/parser.go
// TLS handshake through 135-byte MTU tunnel (multiple round trips).
client := &http.Client{Transport: rt, Timeout: 180 * time.Second}

req, err := http.NewRequestWithContext(pondCtx, http.MethodGet, "http://www.gstatic.com/generate_204", http.NoBody)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you mean to change to HTTP instead of HTTPS? Since one issue was that tunnels timeout/fail during the TLS handshake (due to transfer speed, packet size, etc.), we should use a request equivalent to what will be sent when actually running. Otherwise, the test is kind of meaningless.

Copy link
Copy Markdown
Contributor Author

@WendelHime WendelHime Jun 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I've changed to HTTP because there was too many failures with dns tunnels while using HTTPS for testing and this test basically create a session and keep it open to further requests to just use the same open session

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh OK. Is that already documented somewhere? If not, would you add a comment in the function doc explaining that this is used to eagerly establish connections?

Comment thread kindling/dnstt/parser.go

m := &multipleDNSTTTransport{
tunChan: make(chan *dnsTunnel, 400),
tunChan: make(chan *dnsTunnel, 400),
Copy link
Copy Markdown
Collaborator

@garmr-ulfr garmr-ulfr Jun 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

400 is kind of high, especially given the recent iOS memory issues. Since tunChan holds successful tunnels, we only need to retain a few, <10. Once we find enough working ones we can stop testing the rest, then only we start testing again from were we left off to find replacements for ones that stop working upto the max.

Comment thread kindling/dnstt/parser.go
Comment on lines +356 to +359
tun.markSucceeded()
if !m.closed.Load() {
m.tunChan <- tun
} else {
Copy link
Copy Markdown
Collaborator

@garmr-ulfr garmr-ulfr Jun 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This builds and adds a new DNSTT instance for successful ones but they're only removed when they fail. So eventually tunChan will fill up and no more can be added, resulting in this blocking until one does fail and is removed and leaking if that doesn't happen. We should skip finding new tunnels if we have some working ones already (5 is reasonable) and only replace broken ones. If we want to establish new tunnels when we receive a new config, we should completely replace existing ones, while not exceeding the max open tunnels.

Comment thread kindling/dnstt/parser.go
Comment on lines +444 to +446
// NewRoundTripper creates a pre-connected HTTP round tripper for the given
// address. It blocks until a KCP session and smux stream are established so
// that the race transport can fairly compare connection latencies.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It blocks until a KCP session and smux stream are established so that the race transport can fairly compare connection latencies.

The KCP session and smux streams are actually established lazily — on first request. So this isn't correct.

Comment thread kindling/dnstt/parser.go
Comment on lines +459 to +464
rt, err := tun.NewRoundTripper(ctx, addr)
if err != nil {
tun.recordFailure()
tun.Close()
slog.WarnContext(ctx, "dnstt roundtripper creation failed during connect",
"domain", tun.domain, "resolver", tun.resolver, "error", err)
Copy link
Copy Markdown
Collaborator

@garmr-ulfr garmr-ulfr Jun 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NewRoundTripper only returns an error if the dnstt instance is already closed. So there's no need to double-close it here or record the failure.

Comment thread kindling/dnstt/parser.go
Comment on lines +288 to +290
m.probeCancelMx.Lock()
m.probeCancelFn = cancel
m.probeCancelMx.Unlock()
Copy link
Copy Markdown
Collaborator

@garmr-ulfr garmr-ulfr Jun 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tryAllDNSTunnels is triggered by findWorkingDNSTunnels (immediately), NewRoundTripper (via probeCh), and by the timer, which can overlap. Since probeCancelFn is overwritten here each time, it can only cancel the most recently triggered probe. We should use a flag to signal that it's already running and prevent subsequent runs from being triggered.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants